# Efficient Quantization
Deepseek Ai DeepSeek R1 Distill Qwen 14B GGUF
DeepSeek-R1-Distill-Qwen-14B is a 14B-parameter large language model released by DeepSeek AI. It is distilled from DeepSeek-R1 onto a Qwen base model and is offered here in multiple GGUF quantization versions to reduce resource requirements.
Large Language Model
featherless-ai-quants
237
1
Medra27b I1 GGUF
Apache-2.0
A quantized version of Medra27B, offered in multiple quantization types and suited to text generation and medical AI applications.
Large Language Model
Transformers Supports Multiple Languages

mradermacher
337
0
Nvidia Llama 3.1 Nemotron Nano 4B V1.1 GGUF
Other
A quantized version of the NVIDIA Llama-3.1-Nemotron-Nano-4B-v1.1 model, processed with llama.cpp tools for various quantization methods, suitable for running in resource-constrained environments.
Large Language Model English
bartowski
2,553
8
Seed Coder 8B Instruct GGUF
MIT
This model uses a custom quantization scheme: the output and embedding tensors are stored in f16 format, and the remaining tensors are quantized to q5_k or q6_k format, resulting in a smaller size while maintaining performance comparable to pure f16.
Large Language Model English
ZeroWw
434
1
Andrewzh Absolute Zero Reasoner Coder 7b GGUF
A llama.cpp-quantized version of andrewzh's Absolute_Zero_Reasoner-Coder-7b model, available at multiple quantization levels and suitable for reasoning and code generation tasks.
Large Language Model
bartowski
1,325
5
Qwen3 14B AWQ
Apache-2.0
Qwen3-14B-AWQ is the 4-bit AWQ quantized version of Qwen3-14B, part of the latest Qwen series of large language models. It supports seamless switching between reasoning and non-reasoning modes, with strong reasoning, instruction-following, and agent capabilities.
Large Language Model
Transformers

Qwen
15.17k
21
Mlabonne Qwen3 4B Abliterated GGUF
A quantized version of Qwen3-4B-abliterated, produced with llama.cpp in multiple quantization types and suitable for text generation tasks.
Large Language Model
bartowski
3,623
3
Qwen Qwen3 1.7B GGUF
A quantized version based on Qwen/Qwen3-1.7B, using llama.cpp for quantization, supporting multiple quantization types, suitable for text generation tasks.
Large Language Model
bartowski
7,150
10
Dreamgen Lucid V1 Nemo GGUF
Other
A quantized model based on dreamgen/lucid-v1-nemo, processed with llama.cpp for various quantization levels, suitable for text generation tasks.
Large Language Model English
bartowski
6,593
5
Gemma 3 12b It GGUF
A quantized GGUF version of the Gemma 3 12B large language model, suitable for local deployment and use.
Large Language Model
Transformers

tensorblock
336
1
EXAONE Deep 2.4B AWQ
Other
The EXAONE Deep series models excel in reasoning tasks such as mathematics and programming. This model is the 4-bit AWQ quantized version of the 2.4-billion-parameter variant.
Large Language Model
Transformers Supports Multiple Languages

LGAI-EXAONE
751
16
Thedrummer Gemmasutra Small 4B V1 GGUF
Gemmasutra-Small-4B-v1 is a 4B-parameter text generation model, quantized with llama.cpp and available in multiple quantization versions.
Large Language Model
bartowski
583
2
Internvl2 5 4B AWQ
MIT
InternVL2_5-4B-AWQ is the AWQ quantized version of InternVL2_5-4B using autoawq, supporting multilingual and multimodal tasks.
Image-to-Text
Transformers Other

rootonchair
29
2
Ozone Ai 0x Lite GGUF
Apache-2.0
A quantized version of the ozone-ai/0x-lite model that supports Chinese and English text generation. It is produced with llama.cpp imatrix quantization and offers multiple quantization options to suit different hardware requirements.
Large Language Model Supports Multiple Languages
bartowski
220
2
Thedrummer Gemmasutra 9B V1.1 GGUF
Other
A quantized version of the TheDrummer/Gemmasutra-9B-v1.1 model, processed using llama.cpp and suitable for text generation tasks.
Large Language Model
bartowski
1,198
6
Mt0 Xxl Mt Q4 K M GGUF
Apache-2.0
This model is a multilingual text generation model converted from bigscience/mt0-xxl-mt to GGUF format via llama.cpp, supporting various language tasks.
Large Language Model Supports Multiple Languages
Markobes
14
1
Summllama3.1 8B GGUF
An 8B-parameter summarization model based on the Llama 3 architecture, offered in multiple quantization versions.
Large Language Model
tensorblock
52
0
FLUX.1 Schnell GGUF
Apache-2.0
FLUX.1-schnell is an efficient text-to-image generation model based on a diffusion model architecture, supporting English text input to generate high-quality images.
Text-to-Image English
second-state
551
11
Phi 3.5 Mini Instruct Uncensored GGUF
Apache-2.0
Phi-3.5-mini-instruct_Uncensored is a quantized language model offered in variants suited to a range of hardware configurations.
Large Language Model
bartowski
1,953
42
FLUX.1 Schnell Quantized
Apache-2.0
A quantized version of FLUX.1-schnell, a text-to-image diffusion model, offered at multiple quantization precision levels.
Text-to-Image English
aifoundry-org
491
7
Bge M3 GGUF
MIT
This model is a sentence similarity model converted from BAAI/bge-m3 to GGUF format using llama.cpp via ggml.ai's GGUF-my-repo space.
Text Embedding
bbvch-ai
266
1
Chronos T5 Tiny
Apache-2.0
Chronos is a family of pretrained time series forecasting models based on language model architectures, trained by quantizing and scaling time series into token sequences.
Climate Model
Transformers

amazon
573.84k
106
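The Chronos description above mentions scaling and quantizing a time series into a token sequence. A minimal sketch of that idea follows, assuming mean-absolute-value scaling and uniform bins. The function names, bin count, and bin range are illustrative assumptions, not values taken from the Chronos release.

```python
from bisect import bisect_right


def bin_edges(num_bins, low, high):
    """Return num_bins + 1 evenly spaced bin boundaries over [low, high]."""
    width = (high - low) / num_bins
    return [low + i * width for i in range(num_bins + 1)]


def tokenize_series(values, num_bins=10, low=-5.0, high=5.0):
    """Scale a non-empty series by its mean absolute value, then bin each point.

    Assumed scheme (hypothetical parameters): uniform bins over [low, high];
    out-of-range points fall into the first or last bin.
    """
    scale = sum(abs(v) for v in values) / len(values)
    if scale == 0:
        scale = 1.0  # avoid dividing by zero for an all-zero series
    interior = bin_edges(num_bins, low, high)[1:-1]
    # bisect_right counts the interior boundaries <= x, giving an id in [0, num_bins - 1]
    tokens = [bisect_right(interior, v / scale) for v in values]
    return tokens, scale


def detokenize(tokens, scale, num_bins=10, low=-5.0, high=5.0):
    """Map token ids back to bin centers and undo the scaling."""
    edges = bin_edges(num_bins, low, high)
    centers = [(edges[i] + edges[i + 1]) / 2 for i in range(num_bins)]
    return [centers[t] * scale for t in tokens]


tokens, scale = tokenize_series([2.0, 4.0, 9.0])
print(tokens, scale)
print(detokenize(tokens, scale))
```

The reconstruction is lossy: each value comes back as the center of its bin, so the error is bounded by half a bin width times the scale for in-range points.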
Chronos T5 Base
Apache-2.0
Chronos is a family of pre-trained time series forecasting models based on language model architecture, which transforms time series into token sequences for training to achieve probabilistic forecasting.
Climate Model
Transformers

amazon
1.4M
30
Llava V1.6 34B GGUF
Apache-2.0
LLaVA 1.6 34B is an open-source multimodal chatbot model developed by fine-tuning a large language model on multimodal instruction-following data. It supports image-to-text and text-to-text generation tasks.
Image-to-Text
cjpais
1,965
40